Abstract: Load disaggregation techniques infer the operation
of different power-consuming devices from a single measurement
point that records the total power draw over time. Thus, a
device consuming power at a given moment can be understood
as information encoded in the power draw. However, similar
power draws or similar combinations of power draws limit the
ability to detect the currently active device set. We present an
information coding perspective of load disaggregation to enable
a better understanding of this process and to support its future
improvement. For typical numbers and types of devices
and their respective power consumption, not all possible device
configurations can be mapped to distinguishable power values.
We introduce the term proficiency to describe the suitability of
a device set for load disaggregation. We provide the notion and
calculation of the entropy of initial device states, the mutual
information of power values and the resulting uncertainty
coefficient or proficiency. We show that the proficiency depends
strongly on the device running probability, especially for devices
with multiple states of power consumption. The application of the
concept is demonstrated with exemplary artificial data as well as
with actual power consumption data from real-world power draw
datasets.

Keywords: load disaggregation, smart metering, information theory


I. Introduction 

There are several reasons why it is beneficial for a power
grid to gather as much information as possible for monitoring
and control purposes, such as giving consumption feedback or
detecting devices with high energy consumption. To avoid
additional costs for hardware, installation and operation, it is
highly valuable to derive this information from few, if not a
single, measurement point(s). Load disaggregation or
Non-Intrusive Load Monitoring (NILM) is a technique for reasoning
about the operation of power-consuming devices from a single
measurement point recording the total power draw. One of its
promising applications is metering within smart homes [?], where
information about single appliance usage is of high interest but
monitoring with many sensors is not an option. Low-cost power
monitoring at device level is one step towards integrating
residential buildings into the future smart grid, which is
considered a key technology for carbon dioxide reduction. NILM
works based on information about the involved devices and
permissible assumptions on usage scenarios. This is feasible
because power-consuming devices unintentionally encode
information into the total power draw. Load disaggregation
algorithms identify attributes within the measured data and draw
meaningful conclusions about the overall consumption scenario.

Load disaggregation that works exclusively on (active) power
values is of high interest because active power is simple to
measure and existing metering infrastructure usually provides
the necessary values. With the upcoming smart meters, accessing
the data becomes even easier. However, a main drawback is that
devices with similar consumption characteristics are hard to
distinguish and that simultaneously running devices add up in
power values. As a consequence, the search space of possible
power values at least doubles in size with each additional
device. Devices with multiple values of power consumption
complicate the task further. A single power value can be caused
either by different devices with similar characteristics or by
the aggregation of multiple devices with lower consumption. This
is why distinguishing all possible scenarios using exclusively
power values is difficult.

Within this paper we discuss the problem of indistinguishable
power values caused by different device configurations.
We use concepts of information theory to quantify the problem
for a given device set by introducing the concept of proficiency
for load disaggregation. It allows comparing the extent of the
problem for different device sets more objectively. We do so
by using real data from different measurement campaigns and
houses with multi-state devices. We further investigate how
proficiency is influenced by the statistical operation
probabilities of single devices and outline how the insights are
useful for improving future NILM algorithms.

The goal of this work is to better understand the mapping
of device configuration scenarios to power values. We identify
this as a coding procedure for information communication.
This knowledge is helpful for further improvement of load
disaggregation, which is decoding in that context. The basic
problem is related to measurement accuracy but has a different
root. The two problems can be clearly separated, which is why
we exclusively use information theory for discrete sources
and combinatorics. In that sense we complement other work
on limits of NILM due to measurement accuracy as well as on
quantification of disaggregation complexity for appliance sets.

Section II explains NILM as an information communication
problem. In Section III the concepts of information theory are
applied to the case of aggregated power values, and two
exemplary device sets are introduced which contain solely on-off
devices. In Section IV these concepts are extended to the more
general case of devices with multiple power consumption values.
In Section V we apply the concepts to nine different appliance
sets and compare them with each other, before we discuss the
results and sketch possible follow-up work in Section VI. In the
last section we summarize the results.


II. Load Disaggregation within Information 
Communication 

Figure 1 shows a scheme of information communication
applied to load disaggregation, following its introduction
by Hart in [?]. It identifies load disaggregation as a
decoding problem in the context of information communication
theory. Load monitoring benefits from being non-intrusive,
which means that any installation or device marking system is
avoided. The primary source of information is the appliance
usage, which causes power consumption. As the main purpose
of a power cable is power supply, the utilized information
content is produced unintentionally. The meaningful decoding
of a signal stream on the power cable is the challenge of load
disaggregation. The code, which is the mapping of the usage
scenarios to the power line signals, is exclusively defined by
the devices and their attributes. There are various attributes that
enable identification of a specific device by its fingerprint on
the power cable. Frequency and non-harmonic device feedback
on the input current is rich in information, but the required
high-resolution measurement is usually costly, and the
transmission functions of the power line circuits and their
influences are not known. This is one reason why Hart [?]
recommends the use of so-called steady-state attributes, such as
power values, for device detection.

[Figure 1: block diagram of the communication chain with the
stages CODING and DECODING.]

Fig. 1. Load disaggregation is the decoding procedure in an information
communication process.


Considerable research has been done on the process shown in
Figure 1. Dong [?] analyzes limits for scenario detection
due to measurement accuracy. A successful and efficient
detection of the desired scenario requires the different parts
to be well coordinated. The applications differ significantly
concerning the acceptable effort in terms of accuracy,
maintenance, installation, computing power, measurement and,
finally, costs. There are examples showing that very high
measurement rates enable distinguishing quite specific scenarios,
e.g., which channel is watched on TV. But for most applications
a very high data volume causes more burden than benefit.

Current approaches to the load disaggregation problem
can be divided into supervised and unsupervised approaches.
A good overview of supervised approaches is given
in [?] and [?]. The supervised approach needs a labeled data
set to train a classifier and can be divided into optimization-
based and pattern-recognition-based [6] algorithms. In the
optimization-based approaches, the problem of aggregated power
profiles is modelled as an optimization problem. The total power
consumption and a database of known power profiles of appliances
are given. With this knowledge, a composition of database power
profiles is selected that estimates the total power consumption
with minimal error [?], [?]. In pattern recognition approaches,
the proposed methods can be divided into clustering approaches
[2], neural network algorithms [9], [10], [11] and support vector
machine based algorithms [?], [?]. The disadvantage of the
supervised classification approaches is the requirement of a
priori information.

Accordingly, recent research is more concerned with
unsupervised algorithms, which use unlabeled data. Unsupervised
algorithms do not require any training data and therefore no a
priori information about the system. Current approaches are based
on dynamic time warping [?], clustering with blind source
separation [?], Hidden Markov Models (HMM) [?], [13], [?],
Factorial HMM [19], other variations of HMM [20], [21], temporal
motif mining [22] and blind source separation [23]. For all of
these approaches the distinction between appliances is
unsupervised, whereas the labeling of a model with the
corresponding appliance is not done automatically. Approaches
performing automatic labeling are based on Bayesian inference [?]
and semi-supervised classification [25].

The device states (specifically their values for power
consumption) define the set of all possible device
configurations, i.e., the state space. The usage is unknown and
generates the aggregated power draw. Usage depends on the device
operators and the built-in programs which make devices
change their power consumption. The usage maps the possible
device states to power values. Load disaggregation reverses
what usage does: the power profile constitutes the input and the
current device states are to be derived. The mapping of device
states to power values is a coding process. The code depends
exclusively on the power values of the device states. There
is no guarantee that this code is uniquely decodable, and the
possibilities to modify that code are limited.

Additional difficulties that arise in the practice of load
disaggregation, e.g., measurement resolution or noise, are not
considered within this work. The theoretical constraints
demonstrated within this paper arise for an idealized case where
integer power values characterize the device states. The explicit
inclusion of measurement accuracy is on the one hand not
necessary to demonstrate what we aim for and on the other
hand offers no solution to the problem. The basic concepts
are elucidated with on-off devices and are subsequently extended
to multi-state devices. Correlations between different states and
time durations are not taken into account.

III. The state space of an appliance set

Within this section only on-off devices with a single
value of power consumption $P_d$ are considered. The set of
devices, and thus the set of power values

$$P_D : \{P_1, \ldots, P_N\}$$

is known. We define the order of the device set such that
$P_d < P_{d+1}$. Without any additional knowledge it is possible
to calculate all the possible power values $P_k$ by aggregation.
The state number $k$ specifies the subset of devices which is
turned on and the complementary subset which is turned off.
The first state is defined by the power value $P_1 = 0$, and the
power value of the last state is the sum over all single devices

$$P_M = P_{total} = \sum_{d=1}^{N} P_d \qquad (1)$$

which can be used to characterize the device set. In between
these particular cases there are always $\binom{N}{z}$ cases
where $z$ out of the $N$ devices are turned on.

k      1    2...N+1    ...             ...               M-N...M-1    M
z      0    1          2               N-2               N-1          N
n_z    1    N          $\binom{N}{2}$  $\binom{N}{N-2}$  N            1

TABLE I
The table enumerates the M power states, the number of turned-on
appliances z and n_z, the number of different states with the
same z.

The total number of possible states results to

$$M = \sum_{z=0}^{N} \binom{N}{z} = 2^N \qquad (2)$$


which is equal to the number of possible states of a binary word
of length $N$. In the context of load disaggregation some of
those states are very unlikely, even practically impossible, to
occur. But a priori, without knowing anything about the source
and the emitted load profile, it is impossible to detect which
ones are more likely to occur.

How these $M$ states map to power values depends on the
properties of the device set, i.e., the single device power
values. The power value $P_k$ of a specific state $k$ is
calculated by

$$P_k = \sum_{d=1}^{N} S_{kd} P_d \qquad (3)$$

where $S_{kd}$ is the state matrix that contains a vector for each
state $k$ that holds a 1 for turned-on and a 0 for turned-off
devices. Repetition for all the states leads to the set of possible
aggregated power values.
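
The enumeration behind Equation (3) can be sketched in a few lines of
Python. This is a minimal illustration, not the authors' code: the
function name `state_powers` and the three-device example set are
assumptions made here, and the states are generated in the order of
`itertools.product`, which starts with all devices off and ends with
all devices on but does not necessarily match the state numbering
used in the text.

```python
from itertools import product

def state_powers(device_powers):
    """Return the aggregated power P_k of every on-off state k (Eq. 3)."""
    powers = []
    # Each tuple is one row of the state matrix S_kd: 1 = on, 0 = off.
    for row in product((0, 1), repeat=len(device_powers)):
        powers.append(sum(s * p for s, p in zip(row, device_powers)))
    return powers

# Hypothetical three-device set in watts, chosen only for illustration.
example_set = [5, 10, 15]
powers = state_powers(example_set)
print(len(powers))     # number of states M = 2**N
print(sorted(powers))  # the value 15 appears twice: 15 W alone or 5 W + 10 W
```

Even this tiny set already contains two states that share one power
value, which is the ambiguity the following sections quantify.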

Further we refer to two exemplary device sets, each
containing ten on-off devices. Device set A has a linear power
spectrum, in the sense that

$$P_d = P_{d-1} + P_a \qquad (4)$$

where we use $P_1 = P_a = 5$ W. Device set B contains
the power values $P_B : \{1, 2, 3, 5, 8, 14, 24, 41, 69, 117\}$.
That power spectrum can be approximated by

$$P_d \approx a P_{d-1} \qquad (5)$$

for $a = 1.69$ and $P_1 = 1$ W and is therefore of power law type.
Additionally, these two sets have comparable total powers of
275 W and 284 W, which are of the same order of magnitude.

Fig. 2. The power values of device set A follow Equation (4) with
$P_{total} = 275$. Set B has $P_{total} = 284$ and single device
values according to Equation (5). (Horizontal axis: device index d.)

A load profile is a stream of power values $P_i$ of length $n$.
The total consumed energy is

$$E = \sum_{i=1}^{n} P_i \Delta t \qquad (6)$$

where $\Delta t$ is the sampling time and the power values $P_i$
are averaged within a sampling duration. The average power of a
load profile is

$$\bar{P} = \frac{E}{n \Delta t} \qquad (7)$$

The power values $P_i$ result from the aggregation of the power
values of the turned-on devices at time step $i$, so that

$$P(i) = \sum_{s=1}^{N} P_s S_s^{on}(i) \qquad (8)$$

where $N$ is the number of all devices and $S_s^{on}(i)$ is a
Boolean state function which is 1 when device $s$ is operated at
time $i$ and 0 otherwise.
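
As a small worked illustration of Equations (6) to (8), the following
Python sketch builds a load profile from Boolean on/off indicators and
evaluates its energy and average power. The function names, the
two-device set and the four-step schedule are hypothetical and serve
only as an example.

```python
def aggregate(device_powers, on_states):
    """Eq. (8): total power per time step from Boolean on/off indicators."""
    return [sum(p * s for p, s in zip(device_powers, step)) for step in on_states]

def energy(profile, dt):
    """Eq. (6): energy of a load profile sampled every dt seconds."""
    return sum(p * dt for p in profile)

def average_power(profile, dt):
    """Eq. (7): average power of the load profile."""
    return energy(profile, dt) / (len(profile) * dt)

# Hypothetical example: a 5 W and a 10 W device over four time steps.
device_powers = [5.0, 10.0]
on_states = [(0, 0), (1, 0), (1, 1), (0, 1)]
profile = aggregate(device_powers, on_states)
print(profile, energy(profile, dt=1.0), average_power(profile, dt=1.0))
```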


A. Equal state probability 

When there is no knowledge of a source available, it is
common in information theory to assume the maximum entropy
case, which means equal likelihood for all possible source
symbols. All of the state probabilities $p_k$ have the same value
of $1/M$ and the entropy of the source, which is defined as

$$H = -\sum_{k=1}^{M} p_k \,\mathrm{ld}(p_k) , \qquad (9)$$

has its highest possible value of $H_{max} = \mathrm{ld}(M)$. The
binary logarithm $\log_2$ is written as ld. $H_{max}$ is an upper
bound for the entropy of a discrete memoryless, time-invariant
source (DMS). The entropy $H_{max}$ of a load source depends on the
number of states or devices. As a first step it allows comparing
the difficulty of load disaggregation problems with different
numbers of devices. Furthermore, it is an upper bound for the
entropy of any load profile from this source. An equal
distribution of power states results in an equal average run-time
for each single device. Table I shows that there are as many
states with one device on as there are with one device off.
This leads to the conclusion that, if all the $M$ states are
hypothetically visited once, each device is running exactly
$M/2$ times, which is half of the total duration. Therefore we
get the relation

$$\frac{1}{M} \sum_{k=1}^{M} P_k = \frac{1}{2} P_{total} \qquad (10)$$

between the average state power and the aggregated power of
the device set $P_{total}$.

Counting the number of states with a specific power value
$P_k$ gives a power state occupation number $c$. It can be written
using the Dirac delta function as

$$c(P) = \sum_{k=1}^{M} \int_0^\infty \delta(P_k - P)\, dP . \qquad (11)$$

An occupation number above one reflects the challenge of
distinguishing between different states consuming the same
power. States with this power value are not uniquely
distinguishable. Figure 3 shows the occupation numbers for the
exemplary device sets, which have the same total number of
states. For set A there are up to forty states that map to the
same power value, while there are up to eight in set B. In set
A a majority of power values (gray color) is not used at all.
Load disaggregation is therefore expected to be more difficult
for set A than for set B.

Fig. 3. Occupation numbers for set A and set B for all power values. The
colors stand for the number of involved devices, starting from z = 1 in
blue to z = 10 in dark red.
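
For integer power values the occupation number of Equation (11)
reduces to counting how many states share a power value. The sketch
below does this for the two exemplary sets; the function name
`occupation_numbers` is an assumption of this example, and only the
device values themselves are taken from the text.

```python
from collections import Counter
from itertools import product

def occupation_numbers(device_powers):
    """Count how many on-off states map to each aggregated power value."""
    counts = Counter()
    for row in product((0, 1), repeat=len(device_powers)):
        counts[sum(s * p for s, p in zip(row, device_powers))] += 1
    return counts

set_a = [5 * d for d in range(1, 11)]            # linear spectrum, Eq. (4)
set_b = [1, 2, 3, 5, 8, 14, 24, 41, 69, 117]     # values listed for set B

for name, devices in (("A", set_a), ("B", set_b)):
    c = occupation_numbers(devices)
    print(name, "P_total:", sum(devices),
          "max occupation:", max(c.values()),
          "occupied power values:", len(c))
```

For set A the 1024 states fall onto only 56 distinct power values,
which is why so many integer values between zero and the total power
remain unused.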

The power values in Figure 3 represent all states of a
space between zero and $P_{total}$ which are available to encode
the primary information. In that sense power values are the
channel within a theoretical communication setup where the
device states are communicated. Unlike in the classical
coding problem of communication theory, the coding scheme
is fixed and cannot be designed according to the channel
transmission function. The power values can be seen as the
information source on the receiver side. In this context its
entropy is the mutual information of the power value set $I_p$
and is calculated by

$$I_p = -\sum_{P_j=0}^{P_{total}} p(P_j)\,\mathrm{ld}(p(P_j)) . \qquad (12)$$

We assume the power values $P_j$ to be a discrete set between
0 and $P_{total}$, but the definition can be extended to continuous
probability density functions. For equal state likelihood the
power value probability is calculated from $c(P_j)$, so that we can
define

$$I_p^{max} = -\sum_{P_j=0}^{P_{total}} \frac{c(P_j)}{M}\,\mathrm{ld}\!\left(\frac{c(P_j)}{M}\right) \qquad (13)$$

which is the information transported by power values for
the maximal entropy case. Note that it is not the theoretical
maximum of information transportable by these power values.
A simple estimate requires the averaged power state occupation
number

$$\bar{c} = \langle c(P_j) \rangle . \qquad (14)$$

On average each of the $M/\bar{c}$ occupied power values occurs
with a probability of $\bar{c}/M$, which can be used further to
approximate the mutual information by

$$I_p^{max} \approx H_{max} - \mathrm{ld}(\bar{c})$$


When the mutual information is smaller than the entropy of
the source, not all information can be transmitted
and therefore the stream cannot be decoded completely. As a
measure for that loss of information we suggest the uncertainty
coefficient or proficiency, which is defined in information
theory [26] as

$$C = \frac{I_p}{H} \qquad (15)$$

and is shown to be a meaningful performance metric by [27].
We name the proficiency for the maximal entropy case $C_{max}$,

$$C_{max} = \frac{I_p^{max}}{H_{max}} \le 1 - \frac{\mathrm{ld}(\bar{c})}{\mathrm{ld}(M)}$$

which is restricted by an upper bound using the average
occupation number.
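
The upper bound can be made plausible with a short calculation. The
following sketch assumes that the average occupation number is taken
over the $K = M/\bar{c}$ power values with nonzero occupation
$c_j = c(P_j)$; this reading is an assumption of the sketch rather
than something the text states explicitly.

```latex
\begin{align*}
I_p^{max}
  &= -\sum_{j=1}^{K} \frac{c_j}{M}\,\mathrm{ld}\!\left(\frac{c_j}{M}\right)
   = \mathrm{ld}(M) - \frac{1}{M}\sum_{j=1}^{K} c_j\,\mathrm{ld}(c_j)
   && \text{since } \textstyle\sum_j c_j = M \\
\frac{1}{K}\sum_{j=1}^{K} c_j\,\mathrm{ld}(c_j)
  &\ge \bar{c}\,\mathrm{ld}(\bar{c})
   && \text{Jensen, } x\,\mathrm{ld}(x) \text{ is convex} \\
I_p^{max}
  &\le \mathrm{ld}(M) - \frac{K\bar{c}}{M}\,\mathrm{ld}(\bar{c})
   = H_{max} - \mathrm{ld}(\bar{c})
   && \text{since } K\bar{c} = M \\
C_{max} = \frac{I_p^{max}}{H_{max}}
  &\le 1 - \frac{\mathrm{ld}(\bar{c})}{\mathrm{ld}(M)}
\end{align*}
```

Equality holds only when all occupied power values have the same
occupation number, for example when every $c_j = 1$.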

Table II shows the developed information measures for the
exemplary device sets A and B.

         $I_p^{max}$   $C_{max}$   $\bar{c}$
Set A    5.33          0.53        18.3
Set B    8.04          0.80        3.6

TABLE II
The developed measures of average information for the device
sets A and B.


For another hypothetical device set B2, which is similar to B
but with $a = 2$ and $N = 10$, the occupation number is 1 for all
the power values, as shown in Figure 6 (just like the binary
representation of natural numbers). An equal probability in
state space maps to an equal distribution of power values with
$\bar{c} = 1$, which makes the mutual information reach the value
of $H_{max}$. Unique decoding requires the power value space to be
at least as big as the device state space. This means
that only in the case $H_{max} = I_p$ full load disaggregation by
exclusive use of power values is possible. The proficiency and
the averaged occupation number are both one in this case.
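
The measures for the maximal entropy case can be evaluated
numerically for the sets A, B and B2. The sketch below is a minimal
illustration under two assumptions made here: every state has
probability 1/M, and the average occupation number is computed as M
divided by the number of occupied power values. The function name
`max_entropy_measures` is hypothetical.

```python
from collections import Counter
from itertools import product
from math import log2

def max_entropy_measures(device_powers):
    """Return (H_max, I_p^max, C_max, c_avg) for a set of on-off devices."""
    n = len(device_powers)
    m = 2 ** n
    # Occupation number of every aggregated power value.
    counts = Counter(
        sum(s * p for s, p in zip(row, device_powers))
        for row in product((0, 1), repeat=n)
    )
    h_max = log2(m)
    # Entropy of the power values when each state has probability 1/M.
    i_max = -sum((c / m) * log2(c / m) for c in counts.values())
    c_avg = m / len(counts)
    return h_max, i_max, i_max / h_max, c_avg

device_sets = {
    "A": [5 * d for d in range(1, 11)],
    "B": [1, 2, 3, 5, 8, 14, 24, 41, 69, 117],
    "B2": [2 ** d for d in range(10)],
}
results = {name: max_entropy_measures(p) for name, p in device_sets.items()}
for name, (h_max, i_max, c_max, c_avg) in results.items():
    print(f"{name}: H_max={h_max:.2f} I_p^max={i_max:.2f} "
          f"C_max={c_max:.2f} c_avg={c_avg:.1f}")
```

The printed values can be compared with Table II. For B2 every power
value is occupied exactly once, so the proficiency reaches one.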

B. Equal device probability

From the point of view of load disaggregation it is more
suitable to deal with device probabilities than with
probabilities for the combined power states. It is easier to
relate devices to different user scenarios than to power states.
User-dependent devices follow behavioral patterns, e.g., starting
the coffee machine after getting up. Automatic devices (like a
fridge) are turned on regularly and therefore form a major
part of the base load in a power draw. For many types of
devices characteristic operation probabilities can be estimated
[?]. Even though their occurrence can vary, most of them are
more likely to be switched off. For sizing of power lines (in a
household), utilization factors are standard in engineering. The
reasonable assumption is that not all devices (or plugs) are
used simultaneously, which allows the installation of power lines
with smaller cross-sections, which is more economical. These
factors are around 0.5 for households, a little higher for
industrial or commercial installations, and they are expected to
contain a safety buffer.

However, the state probabilities $p_k$ can be easily estimated
in case the single device operation probabilities $p_d$ are known.
From Equation (3) for the state power, the calculation of the
state probability $p_k$ can be derived as

$$p_k = \prod_{d=1}^{N} \left( S_{kd}\, p_d + (1 - S_{kd})(1 - p_d) \right) \qquad (16)$$

assuming that the devices are statistically independent.
Specific device probabilities do not fit the maximum entropy
assumption. But as there is no a priori knowledge of $p_d$, we
use the expected value

$$p = \langle p_d \rangle \qquad (17)$$


for each device to demonstrate how the entropy of the device set
is influenced. A posteriori, the average probability $p$ of
running any device can be calculated by

$$p = \frac{E}{P_{total}\, n\, \Delta t} \qquad (18)$$

using the energy $E$ of a load profile of length $n$. In case
the run-times $n_d$ of the single devices are known, even the
device operation probability can be estimated by

$$p_d = \frac{n_d}{n} .$$

The average device probability is used to get

$$p_k(z) = p^{z(k)} (1 - p)^{N - z(k)} \qquad (19)$$

which is the state probability of a state with $z$ turned-on
devices. It changes exponentially with $z$ and therefore appears
as a straight line on the logarithmic axis on the left-hand
side of Figure 4 for a set of ten devices. State $M$, with all
devices on, has the probability $p^N$ and state 1 has $(1 - p)^N$,
respectively. The figure depicts the entropy

$$h(z) = -n_z\, p_k(z)\,\mathrm{ld}(p_k(z))$$


on the right-hand side, which is an intermediate result when
calculating the total source entropy $H = \sum_z h(z)$. In
accordance with Equation (10), the state probability is constant
for the device probability of $p = 0.5$. The total source entropy,
which is shown in Table III, then reaches $H_{max}$. The entropy
function $h(z)$ is symmetric with respect to $p$, which means the
total entropy for an operation probability of 0.1 is the same
as for a probability of 0.1 to be turned off.

Fig. 4. The single state probabilities $p_k(z)$ (for z turned-on devices)
depend on the (averaged) device operation probabilities. The entropy h(z)
is additionally determined by the number of states.

p    0.1    0.3    0.5    0.7    0.9
H    4.69   8.81   10     8.81   4.69

TABLE III
Total source entropy H for different average device
probabilities p.
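
The dependence of the source entropy on the average device
probability can be checked with a short Python sketch. It groups the
states by the number z of turned-on devices and weights each group by
the binomial coefficient; the function name `source_entropy` and the
default of ten devices are assumptions made for this example.

```python
from math import comb, log2

def source_entropy(p, n=10):
    """Entropy in bits of n independent on-off devices, each on with probability p."""
    total = 0.0
    for z in range(n + 1):
        # Probability of one particular state with z devices turned on.
        p_state = p ** z * (1 - p) ** (n - z)
        if p_state > 0:
            # comb(n, z) states share this probability.
            total -= comb(n, z) * p_state * log2(p_state)
    return total

for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(p, round(source_entropy(p), 2))
```

The printed values can be compared with Table III; the symmetry
around p = 0.5 follows directly from exchanging the roles of on and
off.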

The impact of device probabilities on the entropy propagates
to power values, i.e., mutual information and proficiency. The
calculation of the power value probabilities

$$p(P) = \sum_{k=1}^{M} p_k \int_0^\infty \delta(P_k - P)\, dP \qquad (20)$$

requires consideration of the state probability instead of merely
the occupation number $c(P)$. This is used to calculate the
mutual information

$$I_p = -\sum_{P_j=0}^{P_{total}} p(P_j)\,\mathrm{ld}(p(P_j)) = \sum_{P_j=0}^{P_{total}} h^p(P_j) \qquad (21)$$


for different single device probabilities. The function
$h^p(P_j)$ is shown in Figure 5 for the exemplary device sets A
and B. The three different values of $p$ in Figure 5 are used for
calculating the mutual information $I_p$ and proficiency $C$ in
Table IV. Even though the mutual information for $p = 0.5$ is
higher, the proficiency and therefore the expected accuracy
of disaggregation is lower than for $p = 0.1$. We conclude that
equal probability for operation of all single devices does not
lead to equal occurrence of the states or power values.

         $I_p$                    $C$
p        0.1    0.3    0.5       0.1    0.3    0.5
Set A    3.70   5.14   5.33      0.79   0.58   0.53
Set B    4.50   7.51   8.04      0.96   0.85   0.80

TABLE IV
Mutual information I_p and proficiency C of the device sets A
and B according to Figure 5.

Fig. 5. The power value probabilities determine the entropy function $h^p$
for the power states. We show it for sets A and B for three different
averaged device probabilities p.
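
The step from device probabilities to mutual information and
proficiency can be sketched as follows. The code assumes, as in this
subsection, that all devices run independently with the same
probability p (strictly between 0 and 1); the function name
`measures` is hypothetical.

```python
from collections import defaultdict
from itertools import product
from math import log2

def measures(device_powers, p):
    """Return (H, I_p, C) for on-off devices that each run with probability p."""
    power_prob = defaultdict(float)
    h = 0.0
    for row in product((0, 1), repeat=len(device_powers)):
        # State probability for independent devices.
        p_k = 1.0
        for s in row:
            p_k *= p if s else (1 - p)
        h -= p_k * log2(p_k)
        # Accumulate the probability of the aggregated power value.
        power_prob[sum(s * w for s, w in zip(row, device_powers))] += p_k
    i_p = -sum(q * log2(q) for q in power_prob.values())
    return h, i_p, i_p / h

set_a = [5 * d for d in range(1, 11)]
set_b = [1, 2, 3, 5, 8, 14, 24, 41, 69, 117]
for name, devices in (("A", set_a), ("B", set_b)):
    for p in (0.1, 0.3, 0.5):
        h, i_p, c = measures(devices, p)
        print(f"Set {name} p={p}: H={h:.2f} I_p={i_p:.2f} C={c:.2f}")
```

The output can be compared with Table IV. A lower device probability
concentrates the probability mass on states with few active devices,
which collide less often in their power values.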


IV. Multi-State devices 

Multi-state appliances complicate the description of the
power values for a set of devices. The power consumption of a
device $d$ is specified by a vector with an entry for each of its
$s_d$ power values. A device set with $N$ devices is, for instance,
defined by

$$P_D : \{(P_1^1, P_1^2); (P_2^1, P_2^2, P_2^3); (P_3^1); \ldots; (P_N^1, \ldots, P_N^{s_N})\} .$$

The second device has three power values, $s_2 = 3$, which means
the device has four possible states. Device three is an on-off
device with one power value. The total number of power values

$$S = \sum_{d=1}^{N} s_d \qquad (22)$$

is a characteristic parameter for a device set (in case of
exclusively on-off devices $S = N$). We assume that all power
values are in increasing order to assure a unique description
for a specific device set. The highest power value of a device,
$P_d^{s_d}$, defines the order within the device set so that
$P_d^{s_d} < P_{d+1}^{s_{d+1}}$. The power values of a single
device are sorted such that $P_d^{s} < P_d^{s+1}$.

As in the case of simple devices, the number of possible
states is calculated by multiplying the numbers of states
of all the $N$ devices,

$$M = \prod_{d=1}^{N} (s_d + 1) . \qquad (23)$$

The $M$ states map to the power values $P_k$, which can be
calculated by

$$P_k = \sum_{d=1}^{N} P_d^{S_{kd}} \qquad (24)$$

using a different notation than in Equation (3). The state matrix
element $S_{kd}$ contains the power state of device $d$ associated
with state $k$, which is in accordance with its earlier usage.
Now, $S_{kd}$ is used as an index, not as an exponent, so
$P_d^{S_{kd}}$ is the power value of device $d$ associated with
state $k$. This notation requires the additional definition of
$P_d^0$ for all devices such that

$$P_d^0 = 0 \quad \forall\, d \in N .$$

The mapping of the state number $k$ to the device power state
is more difficult than for exclusively on-off devices but follows
a straightforward principle. For the above example, device 2
is off for the first $s_1 + 1 = 3$ states, i.e., $S_{1\ldots3,\,2} = 0$,
or more generally $S_{1\ldots3,\,d \ge 2} = 0$. For the states
$k = 4 \ldots 6$ device 2 runs with its first power value, i.e.,
$S_{4\ldots6,\,2} = 1$, and so forth. The power value of the last
state is

$$P_M = P_{total} = \sum_{d=1}^{N} P_d^{s_d}$$

which is the highest possible one. The occupation number $c$
is estimated from the set of power values according to (11).

The multiple possible device states require a modification of
the state probabilities $p_k$. The device probability is written in
the same way as the power values, so that $p_d^s$ is the
probability that device $d$ is running on power value $s$. The
state probability is then obtained by

$$p_k = \prod_{d=1}^{N} p_d^{S_{kd}} \qquad (25)$$

using the device state matrix. The off-state probability
$p_d^0$ needs to be calculated by

$$p_d^0 = 1 - \sum_{s=1}^{s_d} p_d^s$$


as it is required within this notation. The notation introduced
for multi-state devices is more general and includes the
two-state devices from Section III. The average device
probability $p$ is not an equal likelihood assumption. The
assumption concerns the likelihood of the off-states; the other
device states are equally likely. The state probability is
further used to estimate the entropy (9), the power value
probability (20) and the mutual information (21), just as in the
case of on-off devices.



Fig. 6. Occupation numbers for power values of the three artificial device 
sets. 

To demonstrate the influence of multi-state devices we
compare three artificial device sets. The multi-state device sets
are based on the device set B2, which is constructed like set B
but with the parameter $a = 2$, i.e., all states have the
occupation number 1, which makes it trivial, as visible in
Figure 6. Both derivative sets have 9 additional states. For set
B2+ device 10 has 9 additional states with the power values of
devices 1 to 9. For set B2x the last nine devices have a second
state with the power value of the previous device. The derivative
sets have the same power values, but they are differently
distributed among the devices. The reference values for several
artificial device sets are listed in Table V. Figure 6 shows the
occupation numbers for the power values.

Device Set   S    M       $H_{max}$   $I_p^{max}$   $C_{max}$   $\bar{c}$
B            10   1024    10          8.04          0.80        3.6
B2           10   1024    10          10            1           1
B2+          19   5632    12.46       9.6           0.77        5.5
B2x          19   39366   15.26       9.8           0.64        38.5

TABLE V

Several parameters for the artificial sets of ten devices.
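
The construction of the derivative sets and the resulting state
counts can be reproduced with a short sketch. It assumes equal state
probability (the maximal entropy case) and computes the average
occupation number as M divided by the number of occupied power
values; the names `multi_state_measures`, `b2_plus` and `b2_times`
are chosen here for illustration.

```python
from collections import Counter
from itertools import product
from math import log2, prod

def multi_state_measures(device_set):
    """Return (S, M, H_max, I_p^max, C_max, c_avg) for equal state probability."""
    # Every device additionally has the off-state with P_d^0 = 0.
    levels = [(0,) + tuple(values) for values in device_set]
    s_total = sum(len(values) for values in device_set)
    m = prod(len(lv) for lv in levels)
    counts = Counter(sum(combo) for combo in product(*levels))
    h_max = log2(m)
    i_max = -sum((c / m) * log2(c / m) for c in counts.values())
    return s_total, m, h_max, i_max, i_max / h_max, m / len(counts)

# B2: ten on-off devices with powers 1, 2, 4, ..., 512.
b2 = [(2 ** d,) for d in range(10)]
# B2+: device 10 additionally takes the power values of devices 1 to 9.
b2_plus = b2[:9] + [tuple(2 ** d for d in range(10))]
# B2x: devices 2 to 10 get a second state with the previous device's power.
b2_times = [(1,)] + [(2 ** (d - 1), 2 ** d) for d in range(1, 10)]

for name, dev in (("B2", b2), ("B2+", b2_plus), ("B2x", b2_times)):
    s, m, h_max, i_max, c_max, c_avg = multi_state_measures(dev)
    print(f"{name}: S={s} M={m} H_max={h_max:.2f} "
          f"I_p^max={i_max:.2f} C_max={c_max:.2f} c_avg={c_avg:.1f}")
```

The printed values can be compared with Table V. The number of states
grows much faster for B2x than for B2+ although both sets add the
same nine power values.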





Device Set   Power States
GreenD 1     [55 140 240], [1220], [60 148 470 570 1225 1265], [1790], [70 155 210 260 423 1898], [40 1900]
GreenD 2     [60], [80], [850], [1580], [80 1725], [90 173 1910]
GreenD 3     [110 235 285 360], [120 1235], [55 125 540 882 1047 1220 1630], [70 2002], [125 245 358 1998 2100], [70 160 2358 2550]
RedD 1       [200 420], [50 210 410 890 1115], [260 710 1440], [55 110 270 300 620 1405 1505], [1680 2478], [2705]
RedD 2       [123], [410], [160 420], [130 210 770], [1050], [40 1718 1850]
RedD 3       [100 400], [210 525 730], [40 365 900 1220 1520], [860 960 1285 1605], [120 540 1698], [2265]
Eco 1        [40], [72], [250 440 785], [50 1225], [1800], [90 180 250 365 2168]
Eco 2        [70], [55 175], [80 185], [50 310], [50 1840], [120 2132]
Eco 3        [100], [120], [130], [100 175 280], [40 1365 1485], [67 190 280 445 650 785 1065 1545]

TABLE VI

Power values of the device sets according to [?].


Fig. 7. Entropy H and mutual information $I_p$ as a function of the device
probability p for the three artificial device sets. The horizontal lines mark
the values for the maximal entropy case.


Entropy and mutual information are shown for different
device probabilities in Figure 7, including the values for the
maximal entropy case. The maximum of the entropy curve for
set B2, which reaches $H_{max}$, shifts due to the additional
power values in the extended sets. In set B2x most devices (9
of 10) have two power values, which equals three states.
For these nine devices the equal distribution of states is
equivalent to a device probability of $p = 2/3$, which is where
the maximum occurs. For set B2+ the entropy function does not
reach $H_{max}$ (depicted as a horizontal line). This is due to
differences in the number of power states between the devices.
While device 10 reaches an equal state distribution at
$p \approx 0.9$, all other (two-state) devices reach it at 0.5. In
other words, device 10 is involved in many of the possible states
but is not operated more frequently to the same extent. The
additional states in the derivative sets significantly increase
the entropy, while the mutual information is actually decreasing.
This is caused by the constant total power $P_{total}$ of the
three device sets.

V. Case study on real device sets 
We apply the measures developed within this paper to 
realistic device sets. We chose data sets frequently used for 
test cases within load disaggregation studies. Such as the 
GreenD ll28ll . the RedD ll29l and the Eco lf30l dataset as 
used in ED, El. To ensure comparability we use exactly the 
same six appliances for each house as quoted as submetered 
power values in eh. The power states of the appliance set 
were detected by an algorithm presented there. For further
information, e. g., how to extract appliance state information 
and the choice of appliances, we refer to ED- 

All the parameters shown in table VII result directly from the
power values of the devices in table VI. The values are
presented in figure 8, which shows the houses ranked
according to their number of states, and in figure 9, in which
the sets are sorted by descending proficiency. Figure 8 depicts


Device Set   Marker    S       M    H_max   I_p,max   C_max      c
GreenD1        •      26    2352    11.2     10.21     0.91    1.23
GreenD2        ■      15     192    7.59      7.20     0.95    1.10
GreenD3        ♦      30   10800    13.4     11.69     0.87    1.76
RedD1          •      26    3456   11.75     10.72     0.91    1.18
RedD2          ■      17     384    8.59       8.4     0.98    1.94
RedD3          ♦      24    2880   11.49     10.04     0.87    1.67
Eco1           •      19     576    9.17      8.84     0.96    2.24
Eco2           ■      17     486    8.92      7.86     0.88    1.79
Eco3           ♦      23    1152   10.17      8.97     0.88    2.57


TABLE VII

Several parameters for the nine sets of N = 6 devices.
Comparison is shown in figures 8 and 9.


entropy and mutual information for the maximal entropy case
and characteristic power values, i.e., the total set power P_total
and the average device set power P_av of the sets in kilowatt. P_av
is the expected average value when all devices are turned
on. It is calculated by first averaging the power values of each
device, $\langle P_d \rangle = \frac{1}{s_d} \sum_{s} P_{d,s}$, and then
averaging over the device set,
$P_{av} = \frac{1}{N} \sum_{d} \langle P_d \rangle$. Statistically, data sets
with more states are expected to have higher total power values.
GreenD2 and Eco3 are exceptions, which leads to the conclusion that the
bias of device selection can be significant. Inferring the number
of states from the total power is therefore inappropriate for
individual device sets. The average set power is between 50
and 75 % of the total set power.
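
The state counts can be recomputed from the power values listed in table VI. The sketch below does this for the Eco 2 and Eco 3 rows. It assumes that each device has one off-state in addition to its listed power values and that the device average is taken over the listed power values only. The function name `set_parameters` is hypothetical.

```python
import math

# Power values of the six devices per house, copied from table VI.
DEVICE_SETS = {
    "Eco2": [[70], [55, 175], [80, 185], [50, 310], [50, 1840], [120, 2132]],
    "Eco3": [[100], [120], [130], [100, 175, 280], [40, 1365, 1485],
             [67, 190, 280, 445, 650, 785, 1065, 1545]],
}

def set_parameters(devices):
    """Return (S, M, H_max, P_av) for a list of per-device power values.

    Assumption: a device with k listed power values has k + 1 states,
    because the off-state is counted as well.
    """
    states = [len(values) + 1 for values in devices]
    s_total = sum(states)              # S: total number of device states
    m_configs = math.prod(states)      # M: number of device configurations
    h_max = math.log2(m_configs)       # H_max: all configurations equally likely
    # Assumption: <P_d> is the mean over the listed power values of device d.
    device_means = [sum(values) / len(values) for values in devices]
    p_av = sum(device_means) / len(devices)
    return s_total, m_configs, h_max, p_av

for name, devices in DEVICE_SETS.items():
    s_total, m_configs, h_max, p_av = set_parameters(devices)
    print(f"{name}: S={s_total} M={m_configs} H_max={h_max:.2f} P_av={p_av:.1f}")
```

For both houses the resulting S, M and H_max agree with the corresponding entries of table VII (S = 17, M = 486, H_max = 8.92 for Eco2 and S = 23, M = 1152, H_max = 10.17 for Eco3).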

Plot a) of figure 9 contains the proficiency for the maximal
entropy case (filled markers) and the proficiency for a low device
probability of p = 0.1 (empty markers). The latter is closer
to one and is a sample of figure 11. Device usage rates
have a significant influence on proficiency. Plot b) shows the
average occupation number c of the power states. In general, it
increases with decreasing proficiency. The real device sets
all yield values between 1 and 3, far below the values of the
artificial sets A and B listed in tables V and II. The average
occupation number measures the average equality of power values.
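
The influence of the device probability on proficiency can be illustrated with a toy device set. The sketch below is not the paper's implementation and rests on several assumptions: devices switch independently, power values are distinguishable only if the total power differs exactly, the mutual information equals the entropy of the total power (because the total power is a deterministic function of the configuration), and the average occupation number is the number of configurations divided by the number of distinct total power values. The function names and the toy power values are hypothetical.

```python
import math
from collections import defaultdict
from itertools import product

def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(q * math.log2(q) for q in probs if q > 0.0)

def proficiency(devices, p=None):
    """Return (H, I, C, c) for a list of per-device power values.

    Assumptions of this sketch: with p=None every state of a device is
    equally likely (maximal entropy case); otherwise a device is off with
    probability 1 - p and its power values are equally likely when on.
    """
    per_device = []
    for values in devices:
        k = len(values)
        probs = [1.0 / (k + 1)] * (k + 1) if p is None else [1.0 - p] + [p / k] * k
        per_device.append(list(zip([0] + list(values), probs)))

    total_prob = defaultdict(float)   # probability of each total power value
    occupation = defaultdict(int)     # configurations sharing that value
    for config in product(*per_device):
        total = sum(power for power, _ in config)
        total_prob[total] += math.prod(prob for _, prob in config)
        occupation[total] += 1

    h_states = sum(entropy([q for _, q in dev]) for dev in per_device)
    i_power = entropy(total_prob.values())
    c_occ = sum(occupation.values()) / len(occupation)
    return h_states, i_power, i_power / h_states, c_occ

toy_set = [[100], [100]]              # two identical single-value devices
print(proficiency(toy_set))           # maximal entropy case
print(proficiency(toy_set, p=0.1))    # low device probability
```

For this toy set the maximal entropy case gives H = 2 bit, I = 1.5 bit, C = 0.75 and c = 4/3, while p = 0.1 raises the proficiency to about 0.81, because the ambiguous configurations become less likely. Whether the same sketch reproduces the values of table VII depends on how nearly equal power values are treated, which is not restated in this section.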
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b) 



Fig. 8. Plot a) shows maximal entropy (filled markers) and the related mutual
information for all device sets. Plot b) shows P_total (filled markers) and the
average of power states (empty markers) from all devices.


Fig. 9. The proficiency C is shown in a). Filled markers show the maximal
entropy case, empty ones the values for p = 0.1. Plot b) shows the average
occupation number c.


The appliance set complexity (AC) from ED is a measure
for the similarity of power values (without considering their
likelihood), which includes similarity. If the distribution of
modeling and measurement errors, which is assumed to be
normal in EQ, is of delta type, the AC is expected to match
the value of c for a specific device set. Values for AC are
therefore always above c.

As demonstrated in section IV, entropy and proficiency are
functions of the device operation probability. Figure 10 shows
entropy and mutual information for each device set, grouped
by the three data sets GreenD, RedD and Eco. The maximal
entropy values are depicted by horizontal lines. In figure 11 the
proficiency for the values presented in figure 10 is plotted. The
device sets react differently to varying device probability. The
three RedD sets get close to 1 for low p, which means that
the power values with few involved devices are generally
distinguishable. The device sets RedD3 and GreenD3 have the
lowest C values at comparatively high p, which means that the
indistinguishable power values include many devices, making
them less likely to occur. The GreenD2 set is special, as its
proficiency is barely influenced by p. It is ranked lowest
according to the number of states, i.e., M or H_max in figure 9,
while for low p < 0.1 its proficiency is smaller than that of GreenD3,
which is uppermost in figure 9.


Fig. 10. Entropy and mutual information are shown for all the data sets. The
horizontal lines mark the respective values for the maximum entropy case.


Fig. 11. The proficiency C changes with the averaged device probability.
The function is mainly related to the distribution of power values among the
devices.


VI. Discussion 

Load disaggregation is the decoding process within an
information communication problem. The code depends exclusively
on device attributes and their representation in the power draw.
Entropy, as a measure for the number of initial states (equaling
possible device configurations), has the advantage that it adds
up when two device sets are merged. This is generally not
true for the mutual information of power values, which is an
entropy-type measure as well. The values for the maximal
entropy case are a bound for more realistic cases that include
the probabilities of devices to run and of the power values,
respectively. Proficiency gives the fraction of information
about the device states that can be reproduced from the
power values. It therefore might qualify as an upper bound for
detection rates of NILM algorithms. Showing to what extent
this is true would require the evaluation of a NILM algorithm
with a considerable set of power draws. Furthermore, it is necessary
to define mutual information for continuous power values with
respect to the signal-to-noise ratio.
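
The different behavior of entropy and mutual information under merging can be seen in a small example. The sketch below uses the maximal entropy case and assumes that the mutual information equals the entropy of the total power, with power values counted as distinguishable only if they differ exactly. The two device sets and the function names are hypothetical.

```python
import math
from collections import Counter
from itertools import product

def entropy_bits(counts):
    """Entropy in bits of a distribution given by occurrence counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts)

def h_and_i(devices):
    """Return (H, I) for equally likely device configurations.

    H is the entropy of the configurations, I the entropy of the total
    power values (assumption: exact equality decides distinguishability).
    """
    levels = [[0] + list(values) for values in devices]
    sums = Counter(sum(config) for config in product(*levels))
    m_configs = math.prod(len(level) for level in levels)
    return math.log2(m_configs), entropy_bits(list(sums.values()))

set_a = [[100], [250]]   # hypothetical device set A
set_b = [[100], [400]]   # hypothetical device set B
h_a, i_a = h_and_i(set_a)
h_b, i_b = h_and_i(set_b)
h_ab, i_ab = h_and_i(set_a + set_b)

print(h_a + h_b, h_ab)   # entropy adds up
print(i_a + i_b, i_ab)   # mutual information does not
```

Each set alone is fully distinguishable (H = I = 2 bit). After merging, the entropy is 4 bit, but the two 100 W devices cannot be told apart, so the mutual information is only 3.5 bit.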

A set of power draws could be used for follow-up projects.
Estimating the power values of the individual devices by analyzing
the histogram of the power draw would improve unsupervised
NILM. Furthermore, device probabilities could be assessed by
simple measures. For a specific power draw, the average consumed
power in relation to the total power of the device set
allows the average device run times to be estimated. The proportion
of time steps without any running device supports similar
reasoning and is just as easy to estimate.
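
Both estimates can be written down in a few lines. The sketch below assumes N independent devices with one power value each and a common device probability p, so that the mean power is p times the total power and the probability of zero total power is (1 - p)^N. The function names and the synthetic power draw are hypothetical.

```python
def estimate_p_from_mean_power(power_draw, total_power):
    """Estimate p as the ratio of mean consumed power to total set power.

    Assumption: every device has a single power value, so the expected
    power draw is p * total_power.
    """
    return (sum(power_draw) / len(power_draw)) / total_power

def estimate_p_from_idle_fraction(power_draw, n_devices):
    """Estimate p from the share of time steps with zero power.

    Assumption: devices run independently with the same probability p,
    so the idle share equals (1 - p) ** n_devices.
    """
    idle = sum(1 for value in power_draw if value == 0) / len(power_draw)
    return 1.0 - idle ** (1.0 / n_devices)

# Synthetic draw of a 100 W and a 200 W device, both with p = 0.1:
# 81 idle steps, 9 steps each with one device running, 1 step with both.
power_draw = [0] * 81 + [100] * 9 + [200] * 9 + [300] * 1

print(estimate_p_from_mean_power(power_draw, total_power=300))
print(estimate_p_from_idle_fraction(power_draw, n_devices=2))
```

Both estimators return a value of about 0.1 for this synthetic draw. For devices with several power values or unequal run times the two estimates would differ, and neither should be read as more than a rough indicator.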

The concepts developed in this paper can be extended to any
parameter space with other attributes as used in [33]. Those
can but do not necessarily include power values. The concepts
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Fig. 12. The average device operation probability correlates with the average 
power of the resulting time series. The likelihood of the zero power value 
decreases logarithmically with increasing average device probability. 


help to decide if there are more promising attributes of the 
power draw to distinguish scenarios, or whether a single device 
can cause difficulties. 


VII. Summary 

We have modeled load disaggregation as a decoding process
within an information communication problem. Describing
and better understanding the respective coding process
helps in decoding. If power values are used for NILM, the
coding scheme is likely not to be entirely bijective, as not all
possible device configurations are mapped to distinguishable
power values. We have established the calculation of the entropy
of initial device states, the mutual information of power values
and the resulting uncertainty coefficient or proficiency. We
demonstrated that the proficiency is highly dependent on the
device running probability, especially for devices with multiple
values of power consumption. We used artificial exemplary
device sets as well as real measured values of devices that
were repeatedly used in other load disaggregation studies to
demonstrate the meaning of these parameters. The insights
on the coding procedure from device states to aggregated
power values contribute to the improvement of existing NILM
algorithms.
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